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1. Introduction 

Spins or harmonic oscillators on a lattice form a class of models which have been 
studied intensively in statistical physics. Understanding them is the key to many 
problems in condensed matter systems, especially regarding magnetic phenomena but 
also electrical and heat conduction and many other aspects. As the importance of 
quantum phase transitions [Sac99, Voj03] has been more and more realized, interest in 
the ground states of quantum spin models grew. While the relevance of entanglement 
for quantum phase transitions was initially not fully appreciated, it is now a vivid area 
of research (e. g., [OAFF02, VLRK03]), and many researchers feel that paying explicit 
attention to entanglement features is vital for further progress in numerical methods 
for the treatment of spin models [ON02, VPC04, Lat07]. Although quantum phase 
transitions nominally only occur at zero temperature, their presence has great influence 
on the system properties at finite temperature, namely leading to the break-down of 
quasiparticle descriptions. Hence, studying the ground state of spin models holds 
promises to understand experimentally observed features of such systems, not the 
least of which is high-temperature superconductivity. Finally, spin models (including 
bosons on a lattice) are an ideal way to model optical lattices, which are currently 
researched with exciting successes in theory and experiment (reviewed in [LSA + 07]). 

While there are some exactly solvable spin models in one spatial dimension 
[Tak99], for nearly all models in higher dimensions approximative techniques have 
to be used. A variety of quite different techniques have been developed: Most 
prominently, these are quantum Monte Carlo techniques, where recent progress has 
been acieved especially in the context of the so-called world-line Monte Carlo methods 
([ELM93, PST98], reviewed in [KH04]). For one-dimensional systems, extraordinary 
accuracy has become possible with the density matrix renormalisation group (DMRG) 
algorithm ([Whi92, Whi93], review [Sch05]). Recently, this algorithm was extended 
to allow the calculation of not only ground state properties but also of thermal states 
[ZV04, VGC04] and time evolutions [LXW03, MMN05, Vid04, DKSV04] . Also, an 
extension to higher spatial dimensions has been proposed [VC04a]. Its usability in 
practice has been demonstrated only very recently [MVC07]. 

All these variations of DMRG are based on the same class of variationalf states, 
namely matrix product states [R097]. We have recently found [APD+06] that another 
class of states, namely the so-called weighted graph states (WGS), first studied 
in different context in [DHH + 05, HCDB05], is also quite promising as ansatz for 
variational approximation of ground states of spin systems. Its particular advantage 
is the unlimited amount of entanglement that can be present. Hence, we consider 
our technique as especially promising for systems with long-range entanglement such 
as critical systems. A further key difference of our states to matrix product states 
is that their mathematical structure does not reflect any spatial geometry (while 
the product of matrices in a matrix product state reflects a chain or ring geometry, 
as studied especially in [R097, VPC04]) and hence may be expected to be equally 
suitable for higher dimensions (2D or 3D) as for ID. Hence, even though we probably 
cannot compete with the astounding accuracy of DMRG in ID, we aim to provide a 
complementary alternative to the higher-dimension generalisations of DMRG [VC04a] . 
In [APD+06], we presented this technique and demonstrated its use for simple spin-1/2 
systems in one and two dimensions. In the present article we explain our method in 

fStrictly speaking, only the fixed-length phase of DMRG can be called a variational method. 
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much more detail, show new results we have obtained since then (especially regarding 
the treatment of spins higher than spin- 1/2, and concerning heuristics to perform the 
minimizations) and tests its usefuleness on practical examples. The article is self- 
contained and docs not assume the reader's familiarity with weighted graph states or 
the content of [APD+06]. 

This article is organised as follows: We start in Sec. 2 by reviewing some general 
observations about variational methods. In Sec. 3 wc describe our class of variational 
states as a generalisation of weighted graph states and discuss their paramctrisation. 
Section 4 explains how reduced density matrices of these states are calculated in an 
efficient manner in order to be able to evaluate expectation values of observables, 
including energy. To test our method, we show results for calculations on two 
different models (namely the XY model and the Bose-Hubbard model) in Sec. 5. 
In a variational method, a crucial part is finding a state within the given class that 
minimises the energy as well as possible. Our techniques for doing so are the topic 
of Sec. 6. Wc add some further notes on the details of our numerical implementation 
and its performance (Sec. Appendix A), and finish with a conclusion and an outlook 
on further work (Sec. 7) 



2. General considerations on variation 



For a Hamiltonian H that is too large to diagonalise one can approximate the ground 
state using the Rayleigh-Ritz variational method. One uses a family of states |^(x)) 
which depend on some parameter x. It may be better to see this as a map from a 
parameter space R K to a Hilbert space H: 

* : R K -> H, x^ |*(x)) . 

One then solves the minimisation problem 

. . (*(x)|iJ|*(x)) 
E min = min£(x); with £(x) = (1) 

in order to obtain an upper bound -Emm to the ground state energy and an 
approximation ^(xmm)) for the ground state. 

For this to give good results, the map iff has to fulfil the following conditions: 

(i) There must be an efficient algorithm to calculate the expectation value of 
observables for any state ^(x). In principle, it is sufficient to be able to calculate 
(vf^x) | H | ^(x)) and (^(x) | ^(x)), but if one wants not only to bound the ground 
state energy, but also analyse the ground state approximant |^(x m i n )), it is desirable 
to be able to calculate expectation values for other observables, too. 

"Efficient" means here a computation time at most polynomial in the number of 
parameters K. As the dimension of the Hilbert space H typically scales exponentially 
in the number N of constituents of the system, we want K to not do the same. Thus, 
the map iff, considered as a a family "Jjv of maps for different system sizes N, should 
be such that the dimension K of its domain scales only polynomially with N , and 
thus logarithmically with dimW. 

(ii) There should be reason to expect that there are states ^(x)) within the range 
of the map iff that have large overlap with the true ground state or at least an energy 
near to the true ground state energy. As the range of iff is a sub-manifold of Ti. of 
dimension at most K <C dim7i, this requires it either to be folded and twisted in a 
quite peculiar way to reach many different regions of Ti., or to happen to occupy the 
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same small part of H. as the ground state. Typically, it is not possible to prove such a 
statement, and one hence has to do with heuristic arguments or numerical evidence. 

(iii) There should be reason to expect that the minimisation programme (1) 
succeeds in finding a good minimum and does not get stuck in a bad local minimum. 
It is often not justified to hope to find the global minimum, but a local minimum of 
an energy only slightly higher than that of the global minimum is hardly worse. 

Whether the minimisation can succeed depends on the "energy landscape" , 
i.e. the graph of -E(x). If this landscape has many local minima, a naive multi- 
start optimisation cannot succeed. Often, the number of local minima increases 
exponentially with TV or K, which may render a method that is efficient for small 
systems useless for larger ones. Hence, one usually has to succeed in tailoring a 
heuristics that helps to find good minima for the specific kind of energy landscape one 
has to deal with. 

One of the best studied variational methods is finite-length DMRG, and we 
shall illustrate the conditions given above by briefly discussing how DMRG (in the 
formulation of Ref. [VPC04]) fulfils them. For DMRG, the class of variational 
states are the matrix product states [OR95, R097]. For an TV-site matrix product 
state, an efficient algorithm exists to evaluate the expectation value of any observable 
that can be written as a sum of tensor products of local operators in time linear 
in TV. This meets condition (i). The expectation that a matrix product state is 
a good approximant for the ground state of a generic ID system (condition (ii)) is 
the very rationale that led White and Noack to their idea of keeping the lowest- 
lying eigenstates not of the short-range Hamiltonian but of the corresponding density 
matrix as explained e. g. in [Whi98]. The fact that condition (iii) is fulfilled, i. e. that 
the "sweeping procedure" of finite-length DMRG does not get stuck in local minima 
is somewhat mysterious, especially in the light of the possibility of construction of 
Hamiltonian for which this cannot be avoided [Eis06]. Nevertheless, the construction 
principle, as exposed in [VPC04], shows that the matrix for each site has direct 
influence only on this site and its neighbours, i. e. matrix product states allow for 
an essentially local description of states despite the existence of significant amount 
of entanglement. Hence, is seems natural that — barring "pathological" cases such as 
those discussed in [Eis06] — the local variation of matrices during sweeps allows for a 
good minimisation, provided the initial TV-site state was chosen well (which is the task 
of the so-called "warm-up", which uses infinite- length DMRG). Furthermore, Wolf et 
al. have recently shown a close connection between approximability and Renyi entropy 
for matrix product states [SWVC07]. 

We shall come back to some of these points when comparing our variational states 
with matrix product states at the end of Sec. 3. 

3. The class of variational states 

3.1. Basic idea 

Our class of quantum states derives from the so-called weighted graph states, which 
were introduced in [DHH+05, RBB03] and also used in [HCDB05]. They are a 
generalisation of graph states (introduced in [BR01], sec [HDE + 06] for a review). 
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For a Hilbert space of N qubits, they are defined asf 



N N 



i r >=i : ww\+) 



(2) 



a— 1 b—a+1 



where a product of phase gates W v is applied onto a tensor product of |+) = 
(|0) + \l))/y/2 states. These phase gates are two-qubit operation, diagonal in the 
computational basis, and of the form \ 



It may help to see the effect of W on small states. This is, eg., a three-qubit 
weighted graph state (where the qubits are numbered 1, 2, 3 from left to right): 



For every pair a, b of spins, there is a phase gate with a phase ip a b = ifba- The key 
observation and starting point of the work in [DHH+05] is that even for very large N, 
we can efficiently calculate any reduced density matrix for a subset Ac {1,2,..., N} 
of the qubits as long as the number of qubits in A (i. e. the number of qubits not traced 
over) is low. This calculation is efficient in the sense that the time requirement scales 
only polynomially in N (though exponentially in \A\). This is remarkable because in 
the generic case, the time to calculate a reduced density matrix is exponential in TV, 
and most classes of states which allow for calculation of reduced density matrices in 
polynomial times are bounded in the amount of entanglement that they can contain. 
Especially in the case of matrix product states, this fact is the dominant reason why 
DMRG cannot be applied successfully for certain settings [VPC04]. Weighted graph 
states, on the other hand, are not bounded in the amount of their entanglement, as 
shall be explained in Sec. 3.3. 

There is no guarantee that these states spread through those parts of the Hilbert 
space which are of interest for us, and hence, we add as many further degrees of 
freedom to the form (2) as possible without losing the ability to efficiently calculate 
reduced density matrices. As will be demonstrated in Sec. 4, the following additions 
do not hinder the efficiency of the reduced density matrix evaluation: (i) Let the phase 
gates act not simply on |+) 8>JV , but on any /V-qubit product state, (ii) Even weighted 
superpositions of m product states can be treated, provided m is small, (hi) After the 
phase gates, arbitrary local unitaries may be applied. 

3.2. Parametrisation 

Deviating from the treatment in [APD + 06], we develop the formulae not just for spin- 
1/2 particles but, more generally, for n-level systems, i.e. our states live in a Hilbert 



fin this article, superscripts in parentheses always indicate the spins an operator acts on. Hence 
is an operator defined on a 2-spin space, while W^ ab ' 1 is defined on the full TV-spin Hilbert space, 
but has support only on spins a and b. 

Hn [DHH+05, HCDB05], the notation U Vah is used instead of . Here, we use the W to 

emphasise that it is a specific, and not some general unitary. Note also the absence of a minus sign 
in the exponential e 1 ^, which differs from the convention used in [DHH+05]. 




(3) 



WWWWWW |+ + +) = -^=(|000> + |100) + |010) + |110) + 

+ |001) + e iipi3 1 101) + e^ 23 |011) + e^is+^+^a) | U1 ) ) 



space H = (C n )® N . 
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3.2.1. Superposition of product states We start with a superposition of m product 
states, which we write 

m N 

nrm E a i (l°> + <i I 1 ) + <2 |2> + . . . + < n _ x |n - 1)) 

j=l a=l 

m N 

= nrm^a,-0^4 s | S ). (4) 

j = l =1 s6S 

The operator nrm denotes normalisation: nrm|^) := \ip) /\\ ||. To facilitate 
notation, we also introduced 

V:={1,2,...,AT} (set of spins) 

S := {0, 1, . . . , n - 1} (set of levels) 

As we normalise afterwards, we can fix the coefficient in front of |0) to 1: d 3 a = 1 for 
all a, j. It will also be useful later to introduce the deformation operators 

Dd ■= E ds I s ) ( s l ' witl1 d = ( d o, 4, • • • i dn-i) 

ses 

and the n-level |+) state 

|+) :=-)=El s >' 
such that the state (4) can now be written in the forms 

nrm ((g) D d A \+f N = nrm E ( II J I s ) ( 5 ) 

Vs£S / sGS™ \cEV ) 



3.2.2. Phase gate We entangle these product states by applying onto each pair a, b 
of spins a generalisation W<& of the 2-level phase gate W v from Eq. (3). We want to 
define W<s> as general as possible, but have to meet three constraints: (i) All W$ have 
to commute (because otherwise the calculation of reduced density matrices explained 
later in Sec. 4 does not work). Hence, they have to be diagonal, (ii) W$ has to 
be unitary (for the same reason). Hence, the entries in its diagonal have to be pure 
phases, (iii) W<& should not have any parameters which can be absorbed without loss 
of generality into the d J a8 . To see, which these are, let us look at the example of n = 3: 




/ $ 00 A>7o 


\ 




( Coi 


\ 


$ 01 Poll 






Co2 




$ 02 A)72 






Co3 




$ 10 /3i7o 






ClO 




$ n /3 l7l 






Cn 




$ 12 /3 l72 






Cl2 




$ 20 /3 2 7o 






C20 




$ 21 &7i 






C21 




V $ 22 /?272 


) 




C22 


) 



If one is given the £st and can choose the (3 S and 74 at will, one does not need the 
freedom to set all entries in W<s>. It suffices to have 4 phases: 



Wg, =diag (1,1,1, l,e h 



1,0*' 



)• 



(6) 
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In general, for n levels, one needs to specify (n — l) 2 phases for each phase gate W§. 
We denote the phases by a (n — 1) x (n — 1) matrix $ a (, (with elements $** b ) and have 

71-1 

W# = l„ x „ © diag (l, c 1 * 31 , e 1 * 32 , . . . , e 1 * 31 ""^ 



s=l 



Defining $ sU = and <I> = for all s, t S S, we can simply write 

= e ci * 3t I st ) ( st \ ( 7 ) 

Our variational states now take the following form: 

(N \ m / \ AT 

^ £ ^ II w£? £ I s ) ■ ( g ) 
a=l / j = l I a, bey / a=l s£S 

V a<fc / 

The vector x is a concatenation of all the parameters that are present in the right- 
hand side, i. e. the (real) parameters of x contain the real and imaginary parts of 
the complex scalars d 3 as and otj, the (real) entries $^* fc of the phase matrices, and the 
parameters describing the N local unitaries U a £ SU(n), a = 1, . . . , N. 



3.2.3. Parametrisation of the unitaries Next, we need to choose a parametrisation of 
SU (n) in order to describe the unitary matrices U . For this, we use an isomorphism 
between the set SU(n) of unitary n x n matrices U and the set of Hcrmitian n x n 
matrices A because Hermitian matrices are easy to parametrise. We could use (a) the 
matrix exponentiation U = expL4 or (b) the Cayley transform (introduced 1846 by 
Cayley, see e.g. [Puz05]) 

U= + A)" 1 . (9) 

To calculate these expressions numerically, we need, for (a), a matrix diagonalisation 
and, for the matrix invertion in (b), an LU factorisation [TB97]. We choose the Cayley 
transform, not only because it is slightly faster, but especially because we will later 
have to evaluate the derivatives of U with respect to its parameters, and while this is 
very involved for (a) [NH95] , it is rather trivial for (b) [P1M] . (A disadvantage seems to 
be on the first glance that the Cayley transform is undefined if A has -1 as eigenvalue, 
because then, (il — ^4) cannot be inverted. The algorithm will not, however, converge 
to this case, and if it happened to hit on it, the program would abort.) 



3.2.4- Parameter count Let us now count the number K of real parameters needed 
to describe a state |^(x)): 

• For each phase gate, we need (n — l) 2 real numbers. In case of one phase matrix 
for each pair of spins, there are N(N — l)/2 gates. 

• For the deformations, i. e., the specification of the initial product states, we need 
2mN(n — 1) real numbers. 

• For the superposition coefficients, 2m reals. 

• An n x n Hermitian matrix is specified by n(n — 1)/2 complex entries in one of the 
triangles above or below the diagonal and n real entries in the diagonal. Hence, 
we need for the TV unitaries a total of Nn(n + l)/2 real parameters. 
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Thus, the number of parameters is 




+ 2mN(n - 1) + 2m + N 



n(n + 1) 



2 



= 0(N 2 n 2 + Nnm). 



(10) 



3. 3. Entanglement properties 

As already mentioned an important motivation for this work was the goal to find a class 
of states which exhibit strong entanglement over arbitrary distances that is somewhat 
"generic" . After all, the limited ability to describe such entanglement is a common 
shortcoming of many approximation methods for many-body quantum mechanics. For 
the case of DMRG, this has been studied in detail in Ref. [VPC04]. There, it was 
shown that the matrix product states that arise during DMRG can be understood as 
"projections" from an auxilliary linear quantum system of the valence bond solid type 
[VC04b]. Hence, whenever one cuts the matrix product states "chain" into two parts, 
the blockwise entanglement (i. e., the entropy of the reduced density matrix of one of 
either part) is bounded by 21og 2 D, where D is the dimension of the auxilliary spins, 
which is equal to the number of "kept states" in DMRG parlance or the matrix size 
in the matrix product state picture. This explains why DMRG performs not too well 
when applied to long ID systems with long-range entanglement or, more precisely, to 
systems where the blockwise entanglement grows with the block size. 

A scaling of the entanglement is hardly avoidable when treating systems with 
more than one dimension. According to the various "area law" theorems and 
conjectures, for most systems the entanglement of a block versus the rest of the 
system scales linearly with the area of the interface between this block and the rest 
[AEPW02, PEDC05, Wol06, CEPD06, WVHC07] . Hence, for, say, a 2D system, the 
entanglement scales linearily with the surface area of the block and matrix product 
states are unable to render this feature without their matrix size growing quite fast. 
There are ways of replacing the matrices with higher-rank tensors to keep up with 
the area law, yielding so-called projected entangled pair states (PEPSs) [VC04a] but 
the formalism of these is rather tedious and grows more complicated with increasing 
spatial dimension. Also, PEPSs cannot go beyond the area law and are hence still 
unable to treat systems that do not follow the area law, i.e., show entanglement that 
scales superlinearly with the block surface, which typically is the case in critical and 
certain disordered systems [Kor04, KM05, CEP07, VWPC06, EO06, BCS06]. 

We hope that our variational method turns out to be a viable complementary 
method especially to this "PEPS" generalization of DMRG. To see how this claim 
may be substantiated, note that in the description of our states, the geometry of the 
system has not entered yet. Every spin is connected to every other spin by a phase 
gate, and we can thus modell any geometry, i. e., any scheme of neighboring relations. 
The entanglement of a block of M spins w. r. t. the rest of the system (with TV — A I 
spins) can scale with the number of spins M, i.e. with the volume and not with the 
surface area of the block [CHDB05]. Thus, the blockwise entanglement can reach 
the maximum value that is possible in the given Hilbcrt space. Other entanglement 
measures such as localizable entanglement between pairs of spins and also two-point 
correlation functions can reach their maximum value (independent of the distance), 
but can also show exponential or polynomial decay [DHH + 05]. This is already evident 
from the fact that 2D cluster states are within our variational class, and they reach 
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Figure 1. For a system on an L X L square lattice with periodic boundary 
conditions and a system Hamiltonian that is invariant under the lattice's 
symmetry group, only R = O(N) phase matrices are needed. The numbers 
indicate the numbering of these matrices with phase indices v = 0, . . . , R — 1. 
The circles denote sites of a 6 X 6 lattice. In constructing a variational state (8) 
on this lattice, phase gates W$ are performed on any pair of sites, where the 
phase gate acting on the purple shaded site and a site marked with the number v 
uses the phase matrix $„■ Translation of these markings show the phase indices 
for other site pairs. Note, how due to the rotation and reflection symmetries of 
the square lattice, the pattern of phase indices is repeated eight times. 



maximum entanglement in several senses [NMDB06], e. g. the localizable entanglement 
between all pairs of spins is one. 

3.4- Making use of symmetries 

S.^.l. Symmetrising the phases The quadratic scaling of K with the number N of 
spins (lattice sites) in Eq. (10) can be reduced to a linear scaling in case of a system 
Hamiltonian with translational symmetry. This is because in this case it is reasonable 
to assume that we do not lose precision if we let the phase matrices depend not on 
the absolute positions of the spins a and b but only on the position of b relative to a. 
More precisely, we introduce a mapping v :V XV — ► {!.,..., R}, that gives the phase 
index for the spin pair a, b: the phase gate that is applied on the pair (a, b) shall be 
the phase matrix with number v(a,b), and R is the total number of phase matrices. 
The 4th-order tensor $^ r2 now becomes a 3rd-order tensor § r J^b) ■ 

The mapping v has to be constructed such that two pairs of spins, (a, b) and (c, e), 
get the same index, is(a, b) = v(c, e), if and only if the pair (a, b) can be mapped onto 
(c, e) by a symmetry transformation that leaves the system Hamiltonian invariant. 
For the common case of a Hamiltonian that is a sum of identical terms which each 
act on one bond (i. e., connection of lattice sites), this is the symmetry group of the 
lattice. In the case of a square lattice with N — L x L sites on periodic boundary 
conditions (PBC), onlyf 




tThe brackets [-J denote the floor function. 
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phase matrices are needed as can be seen from Fig. 1, and thus, we need only 
K = 0{N) parameters. 

Note also, that v is naturally symmetric, v(a,b) = v(b,a), and that this has to 
be reflected by a like symmetry of $ w. r. t. its upper indices: $^ ir2 = $£ 2ri , which 
must be imposed cxplicitely. 

3.4-2. Full symmetrisation For a symmetric Hamiltonian, it seems natural to reflect 
this symmetry not only in the phases but also in the local, site-dependent properties, 
i.e., in the local unitaries U a and the deformation parameters d^. In case of full 
translation symmetry, one may want to completely drop the dependence of these on 
the site index a. This does indeed reduce the number K of parameters significantly, 
but not as dramatically as in the case of phase symmetrisation. The latter reduced 
the scaling of K from 0(N 2 ) to O(N), while further symmetrisation of the other 
parameters cannot change K = O(N). On the other hand, the time required to 
calculate the energy of a given state is reduced by a factor O(N) in the fully symmetric 
case, as one needs to evaluate it for only one elementary cell of the lattice. 

A good reason not to impose full symmetrisation nevertheless is the observation 
that for many systems, the ground state does not necessarily obey the full symmetry 
of the Hamiltonian due to spontaneous symmetry breaking. Even though in such a 
case, the ground state must be degenerate, and at least one state within the ground 
subspace must obey the full symmetry, this state is unlikely to be the state that 
is easiest to approximate within the chosen class of variational states. To give an 
example: The ground state of the antiferromagnetic Ising chain without transverse 
field is a |0101.. .) + (3 11010 . . .), for any a,/3 with |a| 2 + |/3| 2 = 1. Only for a = f3 
the state is invariant under a translation of one site. However, the state most easily 
approximated is a = 1,0 = (or vice versa), as it is a product state, while any other 
state contains long-range entanglement. If we imposed full translational symmetry 
onto the states, our algorithm would likely fail to find a good state. However, the 
example suggests a compromise between flexibility and low number of parameters: 
We make the local properties U a and periodic in a way that matches the expected 
periodicity of the spontaneously-broken ground state, e.g., in the case of the Ising 
chain, we may use one common unitary and one common deformation vector for 
all odd sites, and another unitary and another deformation vector for all even sites. 
However, our numerical experiments showed that this does not work particularly well: 
the enforcement of such symmetries introduces very many additional local minima 
which trap the minimzation routine much too soon. The intuitive reason for this is 
that enforcing the symmetrie amount to a cut through the energy landscape of the 
parameter space which seems to divide meandering troughs into sepcrated basins. 

Let us nevertheless mention two more possibilities to even further reduce the 
parameter scaling, (i) Wc can make the phase index mapping v such that it does not 
depend on the geometric relation as in Fig. 1 but just on the scalar number of lattice 
steps that separates the spins, the number of phase indices scales linearly only with 
the length L, not with the number of sites N = 0(L V ) (where V is the dimension of 
the system). Together with a full or periodic symmetrization of the local properties, 
we reach a scaling of the number of parameters K = O(L), which allows for a quick 
treatment even of 3D systems of moderate size. The accuracy achieved this way is, 
however, very modest. 

(ii) Often, one may expect long-range entanglement to be suprcssed exponentially 
Then one can choose a distance threshold and fix to zero all phases betweens spins 
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with a distance above this threshold. The threshold will typically be chosen of the 
order of the entanglement length, and as the latter usually docs not increase strongly 
with the system size (except at criticality) one can save considerably on the number 
of parameters. 

4. Evaluating observables 

In order to evaluate an observable O with support on A C V, we need to evaluate 
(0) x = tr Op A 

with 

p A :=tr nA |*(x)) (*(x)|. 

As we shall see now, pa can be calculated in time polynomial in the number N — \A\ 
of spins, the number n of levels per spin and the number m of superpositions, but 
exponential in the number \A\ of spins not traced over. Hence, the expectation value 
(0) x of observables can be calculated efficiently as long as O is a sum of terms with 
small support. 

In particular, we need this algorithm to evaluate the energy (x) | H |W(x)), as 
this is the quantity we wish to minimise. Thus, due to the scaling properties just 
mentioned, we require that the system Hamiltonian can be written as sum of terms 
with small support (as it is the case nearly always). 



4-1- A pair of spins 

To keep notation simple, we only derive the procedure to obtain the two-spin density 
matrix (A = {a, 6}) 

Pab ■= P{aM = tl' V \{aM |*(«)> (*(x)| . (11) 

This is a generalisation of the work done in [DHH+05] for spin-1/2. A further 
generalisation to more than two spins is easy and its result will be given at the end of 
this section. 

The spins that we do not trace over are denoted a and b. We start by inserting 
Eq. (8) into Eq. (11) and pull as much as possible out of the partial trace: 

p ab = nrm (U a ® U b ) Wq> ab x 



m i 



(12) 



x Wl ab (U a ® U b y . 

Here, the operator nrm again means normalisation, now defined as nrmp := p/tip, 
and the inner term p J ab contains anything that cannot be pulled out of the partial 



trace: 



with 



P'ab = tr V\{aM 



r ab ){r a b 



(13) 



n 

ceV\{a,b} 



c£V\{a,b} s£S 
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which is, due to Eqs. (5) and (7), 




K ceV\{a,b} 

Note that in the trace (13) all the phase gates W$ oe with c, e ^ {a, b} cancel with their 
Hermitian conjugate, as do all the local unitaries U c , c ^ {a, b}. Hence, p a b depends 
only on a subset of the parameters. 

In order to take the trace in Eq. (13), we have to sum over all states |q), 
q G S w ~ 2 , where the underline denotes that the components of q arc not indexed 
(si, S2, ■ ■ ■ , SN-2) but rather using the elements of V\{a, b} as indices. We get 



tib = Z \3 1 ^ 

q es"- 2 



E 



E 



i 



oc ac DC 



i ( KT + *2* - KT - Kt 



q 6 E™- 2 s,s'£S N 

X II d cs c dcs* CX P 

cev\K&} 

= E E II d k<4>P 

r,r'6S 2 qSS™- 2 cGV\{a,t} 

In the last line, we can exchange sum and product in the following manner without 
changing the expression: 

e n - n e 

qGS™- 2 ceV\{a.b} ceV\{a.b} <J C 6S 

This gives 



Fab 



E 

r,r'GS 2 



n 

c£V\{a,b} qeS 



E d U d % cx p 



ac 



be 



(14) 



The sum over S has n terms, and N — 2 such sums are multiplied. Hence, in 
order to calculate one matrix element of we have to evaluate the underbraced 
term (N — 2)n times. This is the origin of the promised polynomial scaling for the 
calculation of expectation values. 

Recall that pl b is an n 2 x n 2 matrix. We will make this more explicit by writing 
the product as Hadamard product. The Hadamard product, denoted (3, is defined 
as the component-wise multiplication of matrices, (B C)y := B^Cij. Its identity, 
denoted 1 Q , is the matrix having 1 as all of its elements. Using this, we can rewrite 
the previous equation in a very compact form: 

O pit (is) 

c£V\{a,b} 

where the matrix elements of p? , are given by the underbraced term in Eq. (14). Each 
factor of the Hadamard product can be understood as resulting from the interaction 
of the spins a and b with one spin c from V\{a, b}. These factors can be calculated 
seperately because the interaction between two different spins in V\{a, b} may be and 



Pab 
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is ignored due to the cancellation of all phase gates within V\{a, b}. (Cf. the remark 
after Eq. (13).) 

To make this more concrete, let us look at the simple case of n = 2. Then (Recall 
for s = or t = 0.) 



that d 3 afi = 1, and $ s * 

Fab 



O (l© + 4d k c *p abt , 

c£V\{a,b} 



with 



Pab,< 



be 

1.1 ,,,,1,1, 



e i<E, 6 



i(-^e 1 +*L 1 ) 
i*l.l 

e be 



eK*^ 1 -*^ 1 ) 
1 



\ e H ^ be i e be e 
which is the formula given in [DHH+05]. 



(16) 



4-2. Several spins 

For reference, we give the result for density matrices for not simply two spins a, b, but 
arbitrary numbers of spins, given in a set A: 

PA :=1avvt|*(x)) (tf(x)| 



with 




The mapping r : A — ► {1, . . . , |A|} gives here the index that spin a £ A gets within 
the density matrix pa (i. e., in the 2-spin case of p a b, r(a) = 1 and r(b) = 2.). 
It is also useful to observe that 




with 

d a)= e i r >rKr TW ( 2 °) 
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This formula comes from the observation that a matrix product of a diagonal matrix, 
an arbitrary matrix, and another diagonal matrix can be written as Hadamard 
product: 



5. Demonstration for two models 

To approximate a ground state, we have to vary the parameters in order to minimize 
the energy. Before we explain our techniques to achieve this we show the results of 
such minimizations for two different models to demonstrate the performance of our 
technique. The two model systems, namely the XY model and the Bose-Hubbard 
model, are presented in the two following subsections. For each of the two models, we 
have used a different implementation (see Sec. 6 for details) and different heuristics for 
the global minimisation. Hence, we shall use these results as examples when explaining 
these heuristics in Sec. 6. As the second implementation is newer and its heuristics 
more sophisticated, the results for the Bose-Hubbard model are more convincing. 
Nevertheless, we also present our results for the XY model, as the old heuristics 
provides illuminating insights into important aspects of our methods behaviour. The 
examples with the XY model are a continuation of the examples for the Ising model 
(wich is a special case of the XY model) already given in [APD+06]. 

5.1. The XY model with transverse field 

The XY model with transverse field for a system of spin-1/2 particles on a lattice is 
given by the Hamiltonian 



where a x ,y,z are the Pauli matrices, B is the set of nearest neighbours, B the transverse 
field and 7 is called the asymmetry. For 7 = 0, we get as special case the XX model, 
and for 7 = 1, we get the Ising model. 

5.1.1. One dimension (spin rings) For ID, the XY model with transverse field can 
be diagonalised using a Jordan- Wigner and then a Bogoliubov transformation (the 
latter is trivial for 7=1). Correlations have been studied in early work in [Pfe69] 
(Ising) and [Kat62, BM71] (XY). The latter article also gives the phase diagram of the 
ID XY system (reproduced in Fig. 2a). The entanglement properties of these phases 
and their transitions have recently found much interest. The behaviour first indicated 
by numerical studies [VLRK03, LRV04] was soon confirmed by analytic calculations 
[JK04, Pes04, IJK05]. 

Our technique seems to be suited to study this model: the results are quite precise. 
Fig. 3 shows a transition through the Ising critical point. The curves show the XX 
correlations for different spin-spin distances in a ring of TV = 16 spins. 




(21) 



Further, for the numerics, one may use 
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Figure 2. Phase diagram of XY model, (a) for ID (according to [BM71]), (b) 
for 2D (according to [Hen84]). 

As our technique tends to spontaneously break symmetry where the true ground 
state does not, it makes sense to plot the two-points correlations! for many different 
values of the parameters of the Hamiltonian (here: B and 7) in order to spot phase 
transitions. We find that it works better to plot correlations for a specific distance 
than to estimate correlation lengths from the data because the system is still so small 
that the exponential decay of correlations is masked by boundary effects . Fig. 4 shows 
such a plot for the ID XY model. As is to be expected, one sees that near critical 
regions correlations are much stronger. (For the infinite chain, the critical regions are: 
XX criticality at 7 = for < B < 1 and XY criticality} for B = 1 [BM71].) The 
spread of the areas of high correlation around the critical regions of the infinite chain 
looks similar areas of high entropy identified in [LLRV05] - compare with Fig. 3 in 
that article (and note that there, entropy is small around 7 = 0, B = 1 despite the 
critical nature of this point - a feature also seen in our plot of correlations.) Had we 
not known the critical regions, it is not merited to conclude that the system is critical 
where the correlations are strong, as the system size and the correlation distance is 
surely to small for this. We rather suggest to use a plot of this kind for a first look 
at a yet unstudied Hamiltonian. Regions of high correlations may suggest points in 
parameter space for which numerical calculations for different system sizes may give 
interesting results. 

Another interesting feature of the ID XY model is the Baruch-McCoy circle, 
which is the defined by B 2 + 7 2 = 1 . On this circle, the ground state has product 
form [BM71]. Our approach accurately reproduces the vanishing of all correlations as 
one approaches a point on the Barouch-McCoy line (Fig. 5). 

5.1.2. Two dimensions The 2D XY model with transverse field has been studied in 
[Hen84]. The main result of the latter treatment is illustrated by Fig. 2b. 

fWe either plot the XX correlations or the maximum singular value of the correlation matrix 

\ J ■> / i,j=x ,y ,z 

^Strictly speaking the model is XY critical only for B = 1, 8 ^ 1, and Ising critical for B = 5 = 1. 
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Ising spin ring, N = 16 
1 r- 1 1 1 




field 

Figure 3. Approximation of the ground state of an Ising ring with 16 spins, 
calculated for m = 4, i.e., the phase gates act on superpositions of 4 product 
states: The orange triangles shows the deviation of the variational energy from 
the true ground state energy as obtained from the exact solution. The other 
symbols show the mean value of XX correlations for different spin-spin distances. 
As a guide to the eye, symbols for the same distance are conmnected by lines. It is 
evident that one seems to need two lines to connect the set of data points for each 
distance, as a single line would "jump" in zig-zag near the critical point. In other 
words: There seem to be two basins of attraction for the minimzer, corresponding 
to the B < 1 and B > 1 phase, and near the critical point, the minimizer may 
either fall into one basin or the other. The use of the sweeping technique, to be 
discussed in Sec. 6.3, allows to get around this undesirable behavior and direct 
the minimizer to minima with smaller basins of attraction which better resemble 
the true ground state in the area of influence of the quantum phase transition. 
Hence, this plot is not meant to show a good result, but rather illustrate the kind 
of failure that motivates and nessecitates the sweeping technique. 



In order to demonstrate our scheme in a 2D setting, we have done calculations 
for a torus (i. e., a square with periodic boundary conditions) of 6 x 6 spins. We fixed 
the asymmetry at 7 = 0.65 and varied the field strength B from to 4.5 in order to 
cross both of the phase transitions indicated in Fig. 2b. The results, shown in Fig. 
6, show prominent kinks at the expected positions of the phase transitions, and the 
correlations fall off in a roughly exponential manner with distance as expected. We 
still see additional jumps due to convergence into wrong basins, and this prompted 
us to seek a means to avoid this, namely the sweeping technique. The plots in the 
following section have been obtained this way and hence do not show such strong 
jumps. 
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Figure 4. Correlations (maximum SV of correlation matrix for distance 6) of 
the ID XY model, calculated for a ring of 16 spins, and with m = 3. The brown 
crosses show the positions of the data points, the colour surface in between is 
interpolated using Sibson's method (see Sec. Appendix A. 5 and be careful to 
not be mislead by artefact of the interpolation, such as appareant features at 
regions with too sparse data points). The correlations appear at that regions that 
also have high entanglement in the thermodynamic limit. (Compare with Fig. 
3 in [LLRV05].) However, this agreement is, unfortunately, only qualitative: A 
comparion with the exact result for the considered finite size case, shown in the 
small plot to the right, reveals that correlations are over-estimated significantly. 



0.016 




Figure 5. Correlations (maximum SV of correlation matrix) in the vicinity of 
the Baruch-McCoy circle. Along this circle, which is defined by B 2 + 7 2 = 1, the 
ground state of the XY model has product form. This is nicely reproduced by our 
numerics: At B = S = l/v2, all correlations vanish. To show this, we here plot 
the correlations for spin-spin distances 1 (red), 2 (green), and 3 (blue) for a cut 
along the line B = <5, i.e., radially through the circle. The x axis is the value of 
B = 5. Calculated for a ring of 16 spins and m = 3 as in Fig. 4. The solid, dark 
lines are results from the variation, the dashed, light lines are exact values. 
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XY, 6x6, PBC, delta=0.65 




0.5 1 1.5 2 2.5 3 3.5 4 4.5 

field 



Figure 6. Correlations (maximum SV of correlation matrix) for the 2D XY 
model, calculated for a torus of 6 X 6 along a cross section through the phase 
plane of Fig. 2b along the line 7 = 0.65 (number of superposed states: m = 3). 
The red arrows show the positions of the two phase boundaries for the infinite case 
(according to [Hen84], cf. Fig. 2b.) As this plot has been produced without use 
of the sweeping technique, some instances of convergence to the wrong minimum 
are evident from the jumps at B > 3. (Compare with the discussion at Fig. 3). 
The different curves show the correlation for spin pairs with distance (d x ,d y ) in 
x and y direction. 

5.2. Bose-Hubbard model 

The Bose-Hubbard model is defined for a system of harmonic oscillators, arranged in 
a lattice, and is described by the Hamiltonian 

H = -J J2 (^A + H.c) + U »a - 1) /2 - n n a (22) 

{a,b}EB o£V a£V 

As before, V is the set of all lattice sites, and B the set of all unordered pairs of 
nearest neighbour. The operators b\ and b a denote the ladder operators to create 
and annihilate a bosonic excitation of the oscillator at site a, and h a — b a b a is the 
number operator. The first term, called the hopping term describes the "hopping" of 
an excitation from a site a to a neighbouring site b, a process which occurs with the 
hopping strength J. The second term describes the repulsion between several bosons 
on the same site. To fix our energy scale, we set the repulsion U to 1 in the following, 
i.e., all dimcnsionlcss energies are to be understood in units of U .\ The last term is 
relevant if the particle number is not fixed, which it is in fact not in our case. Then, 
assigning a value to the chemical potential fi allows to choose the mean density of the 
ground state. 

The Bose-Hubbard Hamiltonian is of interest due to its rich phase diagram, first 
exposed in [FWGF89]. While its original motivation was the description of certain 
structured solid state systems such as arrays of Joscphson junctions, interest in the 

fWhen comparing with other literature, care has to be taken that many authors use the alternative 
convention to set J = 1. Also, J is often denoted t. 
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0.01 0.02 0.03 0.04 0.05 0.06 0.07 

hopping strength J 



Figure 7. Mean compressibility k of a 4 X 4 lattice (with PBC) of Bose-Hubbard 
sites as function of the Hamiltonian parameters J and fi. One can clearly see 
the Mott insulator lobes — characterised by low values (0 in the infinite case) of 
k — for densities p = 1,2,3. and the surrounding superfluid phase. The crosses 
in cyan mark the parameter points for which a calculation was performed. To 
depict the values, an surface was interpolated between these points and is used 
to colour the plot. Unfortunatly, we could not fully get rid of artefacts due to 
the interpolation (see Appendix A. 5), and at those regions where the density of 
data points varies the contour lines become distorted. The resulting "wobbling" 
at regions with sparse data is hence not genuine and should vanish if one adds 
more data points. As cuts through the plane do not suffer from these presentation 
problems, we have calculated much more points for three fixed values of J and 
show these cuts in Fig. 8. 

system increased significantly with the discovery that it can be realized with cold atoms 
in optical lattices [JBC + 98] and with the spectacular experimental demonstration of 
this fact [GME+02], where a transition from the Mott insulator phase to the superfluid 
phase and back was observed. (For a review, see [LSA+07]). 

In order to simulate a bosonic system with our ansatz, we restrict the number of 
occupations at each site. For all the following calculations, we set the dimension of 
each site to n = 5, i. e., the maximum occupation per site it n — 1 = 4. The creation 
operator w is defined such that b' \n — 1) = in order to truncate the Hilbert space. 

A good way to distinguish the Mott insulator from the superfluid phase it to look 
at the mean compressibility 
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Figure 8. As evident from the crosses in Fig. 7, we have many data point for 
J = 0.02, 0.04, 0.062, which allow us to plot vertical cuts through the plane of 
Fig. 7. Here, we show the values of the average occupation number per site p 
(a) and the compressibility k (b) for the mentioned three values of the hopping 
strengths J. The curve for J = 0.02 looks smoothest because these points have 
been calculated to higher accuracy (cf. Fig. 10). 




0.2 0.22 0.24 0.26 0.28 0.3 0.32 0.34 
density p 

Figure 9. For values of the density p corresponding to very low (and integer) total 
particle numbers, we can obtain exact values for energy E and compressibility k 
from diagonalisation of the full Hamiltonian. This shows that our method has 
good accuracy. (Note that the plot corresponds to a very small section of the steep 
left-most flank in Fig. 8b.) For the J = 0.02 curve, which has been calculated 
to especially high accuracy, the exact values for pN = 4,5 (i.e., as N = 4 2 : 
p = 0.25, 0.325) coincide with the approximated and interpolated values with an 
absolute deviation of only 10 — 4 . Even for the values for J = 0.04, which have 
been obtained with much fewer sweeping steps, the accuracy is below 2 ■ 10 — 3 . 

which is strongly suppressed in the Mott insulator phase. Our results shows the 
form of the phase diagram in impressive clarity (Fig. 7). Although each data point 
only required a rather quick and rough calculation, one gets a good overview of the 
ground state properties in dependence of the Hamiltonian parameters J and fi. To 
better show the quantitative features, we have also plotted vertical cuts through the 
(J, n) plane (Fig. 8). Especially for the points at J = 0.02, the zigzag sweeping 
technique (described later in Sec. 6.3) was used to improve accuracy by more than an 
order of magnitude. This can also be seen from Fig. 9: In this plot, we compare the 
observable k, calculated for our approximand states, with exact values. To allow for 
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c 
to 
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Figure 10. Non-local properties of the ground state are much harder to 
obtain than local ones. In order to see whether our technique is also capable 
of yielding good results here, we calculated (for J = 0.02 and varying values 
of p for the 4x4 sites Bose-Hubbard system) the density-density correlation 
7 := A ^2 a £v — P 2 wnere a ' denotes the site that is one or two lattice 

step(s) to the right (denoted with (1,0) and (2,0). (Correlations of this kind 
have, incidentally, been studied recently in [PRB06].) The connected points 
show the results obtained with our approximation, the isolated points have been 
calculated using the worm code (quantum Monte Carlo technique) [TAH98] of 
the ALPS project [ADG+05] (at finite, but low temperature). The agreement 
is qualitatively ok but quantitatively not too precise. (Actually, the precision of 
two-point correlation is unfortunatly insufficient to obtain a good picture of the 
momentum distribution from their Fourier transformation.) 



this comparison, we have included values from exact Lanczos diagonalization. We are 
grateful to G. Pupillo, who supplied these numbers to us. He used a program, written 
for another project and using Arpack [LMSY96], that allows to diagonalize a small 
Bose-Hubbard system exactly if the number of particles is small as well. For a 4 x 4 
system, up to approx. 6 particles in the 16 sites can be treated. This corresponds 
to the very beginning of the plots of Fig. 8, which we have magnified in Fig. 9. The 
accuracy of 10~ 4 for the compressibility is competing well with the precision attainable 
with quantum Monte Carlo techniques. 

While the compressibility is a local observable, the more challenging task is to 
study non-local obscrvablcs such as density-density correlations of the form 



a£V 



\aev 



where a' is the site which has a fixed position relative to a, i.e. v(a,a') = v(a). In 
Fig. 10, we attempt this task for a 4 x 4 lattice. Fig. 11 shows calculations for larger 
systems, up to 32 x 32 sites. While the noise present in the latter plot is small on 
an absolut scale (note that the plot zooms in to a quite small parameter region) is is 
unfortunately still too large to prevent us from doing finite-size scaling. 
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Figure 11. Ground state approximation for Bosc-Hubbard systems with up to 
32 X 32 sites. The figure shows the compressibility re as function of the chemical 
potential n at the phase transition between the n = 1 Mott lobe and the supcrfluid 
phase above of it (for a coupling of J = 0.02{7). As one can see, the numerics cope 
with the large amount of sites but fails to converge with a precision sufficient to 
make out clear finite-size scaling trends. The reason seems to be that the number 
of local minima increases very strongly with system size: While the use of sweeping 
technique allowed to get a smooth curve for the 4x4 case, our attempts on the 
larger systems failed to get smoother than shown here. (We put considerably 
more effort in the data for fi < .8345, but even there, the curves are still very 
noisy. ) 



6. Performing the minimisation 

Usually, the Hamiltonian of a spin system is given in the form of a sum of terms each 
of which has support on only a small number of spins - one or two in most physical 
cases. When the terms acting on single spins are absorbed into those acting on two 
spins, such a Hamiltonian can be written as 



where B is the set of all pairs of spins, on which a term acts jointly. These pairs are 
called bonds in the following, and they typically (but not necessarily) form a regular 
lattice. The bond Hamiltonians H a b may all be equal or not, and only in the former 
case, the simplifications of Sec. 3.4.1 can be used. 

The minimisation problem Eq. (1) that we have to solve then takes the form 



Finding a minimum of a general function of many parameters is a thoroughly 
researched but intrinsically hard problem. Our approach is described in the following. 
As we do not assume the reader's familiarity with numerical optimisation, we will 
explain some textbook knowledge. 




H. 



ab ' 



(a,b)eB 
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6.1. Local search 

Given a starting point x G R K in parameter space, the problem of local search or 
local minimisation is the task of finding a local minimum in the vicinity of Xo. An 
exhaustive treatment of this topic can be found in the standard textbook [NW99] 
which covers all of the algorithms mentioned in the following in detail. In our case, 
we have to deal with unconstrained (i.e., all values of the unbounded space R K are 
admitted) nonlinear (i. e., the energy function -E(x) does not have any simple structure 
that would allow the use of a more powerful, specialised algorithm) local minimisation. 
Algorithms for this case come in two classes: So-called direct methods only require a 
means to evaluate the function at any given point, while gradient-based methods also 
require a means to obtain the gradient V x -E(x) at any given point. 

Direct methods are convenient, but comparably slow. For very small systems 
(chains of up to 6 spin-1/2 sites, corresponding to less than 100 parameters), we could 
achieve convergence with direct methods, using the two most common ones, Nelder- 
Mead [NM64] and Powell [Pow64] minimisation, with Powell minimization converging 
faster. 

For any meaningful system size, however, direct methods are much too slow. 
Hence, we coded routines to obtain the derivatives of E w. r. t. all kinds of parameters.! 
This required rather tedious calculation and coding, and the formulae and their 
derivation are summarised in Appendix B. 

Using the gradient functions, we tried the standard minimisation methods 
the literature offers, namely the Fletcher-Reeves conjugate-gradient method, the 
Polar-Ribierc conjugate-gradient method and the Broyden-Flctcher-Goldfarb-Shanno 
(BFGS) method. We started using the implementations provided by the GNU 
Scientific Library [GTJ+03], which, however, turned out to be not robust enough. 
Nevertheless, it could be established that convergence speed for our problem is as 
usually expected, i. e. Polar-Ribiere (the oldest of the algorithms, from 1964) performs 
worst and BFGS (the newest, from the 1970s) performs best. 

BFGS is a so-called quasi-Newton or Davidon algorithm. This means, it uses the 
gradients obtained at the points visited so far to build up an estimate to the Hessian 
of the function (assuming that the Hessian varies only slowly). The approximation 
to the Hessian (or more precisely, to the inverse of the Hessian) is then used to make 
a good guess for the next step. As the approximant is, like the Hessian, a K x K 
matrix, updating it at each step requires 0(K 2 ) steps, which scales worse than the 
calculation of E(x), and hence, maintaining the BFGS data becomes more expensive 
than evaluating the function. 

The textbook solution to this problem is to use the "limited memory" variantj of 
BFGS, which is known as L-BFGS and stores only a list of the last, say 25, gradients, 
and uses this data to produce a Hessian approximant "on the fly" [BLN95] . We have 
used the L-BFGS-B Fortran code [ZBLN97], which is very robust, not the least due 
to the excellent line search routine [MT94] that it uses. 

A problem is the stop condition, which decides when convergence is assumed. 
We have tried several approaches: watching the norm of the gradient, the size of the 
steps, and the difference of the function value per step; these either taken point-wise, 

fNote that it is not helpful to obtain the gradient by numerical differentiation, as this is hardly 
faster than using a direct method. 

|The term "limited memory" shows that the problem of keeping the full matrix was then, in the 
f980s, not so much seen in the time it takes to update the matrix but rather simple in the fact that 
a large matrix might not fit into the memory of the computer. 
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or averaged over the last, say 30, or 100, steps, or taken the maximum from the last 
30-or-so steps. All this could not clearly predict convergence, as there seem to be 
long shallow slides, which tempt one to stop minimisation prematurely. In the end, 
we found that waiting until progress gets below machine precision is most viable. 
However, in the sweeping technique, described later, the minimzation can be stopped 
once it seems advantageous to first continue with a neighbor. 

The described technique only allows to find local minima. How do we find a 
good local minimum, or even the global one? Although the literature discusses many 
different heuristics and algorithms, it is a far from trivial task to find a good scheme. 
Wc have developed and tested two different heuristics, which shall be explained in the 
following two subsections. Both heuristics are two-phase methods [Sch02], i.e., they 
combine a global driving scheme, that chooses points to start a local search from, with 
the local-search algorithm just discussed. 

In both cases the minimization for a specific tuple of Hamiltonian parameters 
is typically performed several times. Whenever a new energy value is found which is 
lower than all values that have been found so far for this parameter tuple, the previous 
energy value and corresponding state is replaced by the new one. Hence, the longer 
one performs these heuristics the nearer one gets to the true ground state (or more 
precisely: to the lowest lying state within the variational class) . We emphasize that we 
never discard a data point unless it has been "underbid" by a new calculation. This 
makes the result objective even if subjective judgement has been used in carrying out 
the heuristics. 

6. 2. Multi-start and step-wise adding of superpositions 

For the calculation of the results for the XY model (presented in Sec. 5.1), we tried 
several different heuristics in order to "move around" local minima. We finally settled 
for the "multi-start" scheme described now, which turned out to work best, at least 
for the examples we studied: We started by choosing the parameters Xo for a state 
|\l/(xo)) with m = I (i. e., without superpositions) uniformly at random from [—5; 5] 
and the used the L-BFGS algorithm (as explained in Sec. 6.1) to go downhill from 
there towards a minimum. We allowed this minimisation only to run for a limited 
number of function evaluations (typically a few hundred, or up to 1000), and then 
restarted with another randomly chosen initial point. Having done a number (say 15) 
of such "trial runs" , the one that reached the lowest energy within the limited number 
of steps is kept, the other data is discarded. The best run is now allowed to continue for 
a significantly longer time, until the maximum number of "main run" steps (typically, 
several thousand function evaluation) is exhausted or the energy change falls below 
machine precision. Then, we increased the number m of superpositions by one. This 
makes the parameter vector x longer, i. e. 2N + 2 real numbers have to be added 
(for N complex deformation parameters and one complex superposition coefficient, 
cf. Eq. (10) for n = 2). These are again chosen at random, but without changing 
the parameter values that have already been found. (It also helps to choose the new 
value for a m with small modulus, such that the new parameters do not let the state 
stray to far from the already established good state.) Again, a number of trial runs 
is started, with different random numbers to extend the parameter vector, and the 
best one is allowed to continue for many more steps in the main run phase. This 
iterative extending of parameter values was looped until m reached a certain value. 
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Figure 12. Using derivatives to locally judge the quality of the approximant. 
Red (left) dashed vertical line: bad point, green (right) dashed vertical line: good 
point. For explanation see main text (Sec. 6.3). 

This values does not have to be very large: for the results presented in Sec. 5.1, m = 3 
was sufficient. 

A disadvantage of this heuristics is evident in Fig. 3: Some points are much worse 
than their neighbours. For example, the point B = 1.09 shows a sharp peak towards 
worse accuracy (orange line), while its neighbours to both sides are better. What 
happened is that near the phase transition, the two phases compete to govern the 
ground state, and once the minimiser gets trapped by the catch-basin of one of the 
two phases it cannot switch to the other one. In most cases, the multi-start scheme 
will allow us to enter the main run within the catch-basin of the correct phase. If, 
however, the minimum energy of the two competing phases are very close, they cannot 
be distinguished during the short and rough trial runs, and it depends on mere chance 
to which phase we converge. The obvious solution is to use the value of neighbours 
which seem to have converged to lower energies as starting points in order to see if 
this allows to get to lower energies. This is the strategy that we tried next. 

6.3. Zigzag sweeping 

All the results for the Bose-Hubbard model (as presented in Sec. 5.2) have been 
obtained without the use superpositions to the right of the phase gates, i.e., with 
m = 1. Accuracy was instead improved in an iterative way using the following 
heuristics: Start minimisations from parameter vectors chosen at random for a variety 
of different values of Hamiltonian parameters (i.e., chemical potential fi and on-site 
repulsion J, for the Bose-Hubbard Hamiltonian) within the area of interest. Once the 
minimisations have converged more or less, compare each point with its neighbours. 
If one point looks better than a neighbouring, use this point's parameter vector to 
start a minimisation for the neighbouring point's Hamiltonian parameters. 

In order to see how to do this in an objective way, look at the example of the 
red curve of Fig. 8. There, J = 0.02 is kept fixed and /j, varies from -0.08 to 2.7. The 
data points are spaced rather closely (/z varies in steps of 0.003 up to 0.15). Hence, if 
we plot the energy E versus the chemical potential [i and zoom in to look only at a 
few adjacent data points, we may expect to see simply a straight line. If the points lie 
close enough, any deviation from linearity is less likely for physical reasons but rather 
due to different quality (i. c., proximity to the global minimum) of the approximation 
at the points. Hence, we can interpret slight deviations towards higher [lower] energy 
as an indication that the point is a worse [better] approximant than its neighbours. 
As the slope varies too little to clearly see these differences it is helpful to take the 
second numerical derivative to enhance the differences. Following the sketch in Fig. 
12, a simple heuristic emerges: A pronounced peak in the second derivative means 
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that the corresponding point is a better approximant than its neighbours. Hence, use 
its parameter vector as initial values to redo the minimisation at the neighbouring 
Hamiltonian parameters. If one of the two neighbours has a much lower second 
derivative than the other, redo only this one. Conversely, a point with a pronounced 
dip in the second derivatives should be re-done, starting with the parameter vector of 
one of its neighbours, normally the one with the higher second derivative. 

Close to a phase transition, the procedure may get stuck because the step from 
one neighbour to the next changes the state too much. In this case, one should insert 
a new data points between the point that failed to get better and the neighbouring 
point used for the initial value. 

6.4- Outlook to other minimization techniques 

The literature on unconstrained nonlinear minimization is vast, and finding a good 
global minimization scheme requires a lot of trial and error. Apart from the two- 
phase heuristics described above, we have also tried genuine global minimization 
techniques, namely simulated annealing [KGV83] and differential evolution [SP97]. 
Both are genuinly global in the sense that they do not employ a local search stage. 
However, they thus cannot take advantage of the possibility to calculate the gradient. 
Hence, it is not surprising that simulated annealing converged much too slowly to 
be of use. (Simulated annealing is used in many different fields with much success 
but usually for functions with a convoluted potential surface but only few variables. 
We have several hundreds or even thousands of variables.) Differential evolution is 
a genetic algorithm and shows the — on first sight surprising — feature of converging 
to the mean field solution. (This seems explicable from the fact that crossing two 
genotypes in different basins has to end up in a "compromise", which is mean field.) 

One further possibility might be basin hopping, which is a family of techniques 
(reviewed in [WS99]) that combine simulated annealing with a local search phase in 
order to overcome the problem states in the previous paragraph. These ideas are 
quite recent and research is still ongoing. So far, however, it seems that the basin 
hopping requires to perform very many local searches which hence have to converge 
fast. This is unfortunately not so in our case. It seems conceivable that variants can 
be developped that only use rough and hence fast local searches, and this might be a 
way to proceed with our method. 

Another ansatz is using a clustering stage in the global phase of a two-phase 
method [RT87]. This allows to make multi-start much more efficient but has two 
difficult requirements: (i) One needs to factor any degenerecies in the minima out of 
the parameter space. We have not yet studied whether this is possible, (ii) The number 
of local minima must be small enough that one has a decent chance to encounter all 
of them during the local searches. Unfortunately, especially the calculations for Fig. 
11 have brought us to the observation that the number of minima seems to grow very 
fast with the system size. 

A further technique that we have tried is imaginary time evolution, which works 
as follows. Given an initial state |^(xo)) chosen at random, we can find an estimate 
|\If(xj+i)) ~ nrme - H |$(xj+x)) for the discretized evolution of the state under the 
system Hamiltonian in the imaginary time direction. As for most initial states \^o), 
\&g = hmt^oo nrme - is the ground state, this iterative evolution should converge 
to a good approximation of the ground state. We have decomposed e~ AtH into a 
product of bond terms e~ AtHab using standard Trotter decomposition and then tried 
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to find the Ax G R K that maximizes the overlap 

(f(x + Ax) | e~ AtH -» | *(x)) 

v/(*(x + Ax) | *(x + Ax)) (tf (x) | e - 2Atff -" | #(x)) ' 

Unfortunatly, the maximization failed to give good result even for arbitrarily small 
time steps At and we thus abandoned this approach. 

We should also mention that for the some of the results of or previous paper 
[APD+06], we (actually, M. Plenio, who programmed this part) have used a Raylcigh 
minimization technique: One restricts the energy function E(x) in the sense that 
one keeps all but a few parameters fixed. For certain such subsets of only a few 
parameters, namely for the set of parameters corresponding to a single local unitary 
or to the phases and deformations for one pair of qubits, one can write the restricted 
energy function as quotient of two quadratic forms. This is also known as a generalized 
Rayleigh quotient and the global minimum can be find via a generalized eigenvalue 
problem. Such a "global minimum" typically is, however, not even a local minimum 
of the full energy function. The reason that we got good result for the Ising model in 
[APD+06] seems now, in retrospect, have been due to the extraordinarily benign form 
of the corresponding energy landscape. Hence and because the scheme cannot easily 
generalized to spins higher than 1/2, we did not persue this any further. 

7. Conclusion and outlook 

To conclude, we have presented a class of variational states that holds promise to 
approximate the ground states of spin systems and bosonic systems. The advantageous 
properties of this class is that it includes states with an arbitrarily high entanglement 
and the possibility to adapt to arbitrary geometries and number of spatial dimensions. 
Wc have shown how to calculate expectation values of observables for these states and 
demonstrated the approximation of the ground state for two model systems, namely 
the XY spin-1/2 model and the Bose-Hubbard model, in one and two dimensions. 
Furthermore, we have explained heuristics suitable to drive the minimization. 

The method works for small systems and maps out the rough structure of 
phase diagrams. (The system sizes, though small, were sufficient to see the phase 
boundaries even though phase boundaries are defined, strictly speaking, only in 
the thermodynamical limit.) We can calculate observables for states in systems of 
considerable size but have problems in approximating the ground state in larger system 
to precision sufficient to see actual differences between different system sizes and hence 
to do finite-size scaling studies. 

It seems likely that this is not because there were no states in our variational class 
which were close enough to approximate such ground states well. Rather, wc simply 
cannot find them because our minimzation gets trapped in local minima. Can the 
avoid this? This is the crucial question for the future development of the scheme, and 
at this moment, we may only offer some thoughts on that: It seems unlikely that the 
choice of another generic global minimzation algorithm is able to steer around these 
local minima better that those algorithms that we have tried. For further progress, 
it seems hence desirable to have a better understanding of the shape and structure 
of the manifold C H, i.e., our variational set of states as described by the 

mapping from the parameter space. Is, for example, this manifold "folded" more and 
curved stronger than the cqui-cnergy surfaces of typical system Hamiltonians? This 
might explain, why there are so many local minima — and getting a better grasp on 
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topology and metric of the mapping and its image could be most helpful in finding 
a better way to steer towards good minima. 
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Appendix A. Notes on the implementation 

Appendix A.l. Avoiding overflows 

A certain detail is worth mentioning as it may cause some difficulty in the 
implementation: As the product (15) contains 0(A) terms, its values grows 
exponential with the system size N. Even for factors which are quite close to 1, 
the value will leave the range of floating-point arithmetics (on most computers, ca. 
10-308 _ _ _ 10 308 ) for even moderate values of N . To avoid this, one has to compute 
the product by summing up the logarithms of the matrix elements of p? ab c , then 
subtracting a constant from this sum, and then exponentiating the result component- 
wise. The subtraction of the constant does not change the final result, as it formally 
cancels against the final normalisation to unit trace. The exact value of the constant is 
hence irrelevant, but it has to be chosen large enough to avoid a floating-point overflow 
during exponentiation, but not so large that the elements of all the matrices p 3 ^ vanish 
due to floating-point underflows. (That some elements of some of the matrices pj^ 
suffer an underflow is, however, unavoidable, but harmless, as their contribution to 
the result is evidently insignificant.) Especially for large systems, the constant has to 
be readjusted during the minimization. 

Appendix A. 2. Choice of programming languages 

We have written two implementations of our algorithm. The first one, called "ewgs" 
is specialised for spin-1/2. It was used for the results on the XY model (Sec. 5.1), and 
also for the results presented in [APD + 06].f The other, more recent program is called 
"hwgs" and may be used for spins of any size n. "ewgs" is mainly written in CH — h, 
only the outer drivers are written in Python. Python [R + ] is a very modern, quite 
powerful scripting language, that features high-performance just-in-time compilation, 
an exceptionally comprehensive low- and high-level library, an open-source license and 
excellent inter-platform portability. The development of a numerics library for Python 
has reached maturity quite recently with the release of NumPy [Num, Oli06] . Due to 
the higher level of the language, development is much faster in Python than in CH — h 
This makes it advisable to do most of the coding in Python and only write the "hot 

fFor completeness, we should point out one difference between the description in this article and 
the implementation: In "ewgs", the unitaries are not parametrised using the Cayley transform, but 
rather as linear combination of the identity and the Pauli matrices: U = — — "> ltTx >> U2 f v , ) " 3 ' Tz . 
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Figure Al. Performance of our "hwgs" implementation: For a Bose-Hubbard 
system on a square lattice of varying size, the calculation time for a single reduced 
density matrix of a pair of sites (red diamonds) and for the full gradient of 
the energy (derivatives w. r. t. all parameters) (blue squares) is shown. These 
calculations have been done for states with n = 5 levels per site and no 
superpositions (m = 1). The program was run on AMD Opteron machines clocked 
at 2.2 GHz. 

spots", i. e., the proverbial 10% of the code in which the processor spends 90% of the 
time, in an optimizing compiled language such as C++. This approach, though it 
may sound unusual to a traditionally oriented computer physicist, has been used in 
several places with much success (see e.g. the advocacy in [BCG05]), and from our 
experiences, we clearly recommend its use. Hence, for our second implementation, 
"hwgs", we followed this paradigm consequently and wrote only a small part in CH — h 
This part was bound to the main Python code using SWIG [B + ]. For the local 
minimizcr we used in both implementations the L-BFGS-B Fortran code [ZBLN97], 
linked to Python with the help of the tool f2py [Pet]. 

Appendix A. 3. Performance 

The performance of the "hwgs" implementation can be seen in Fig. Al. The blue 
curves shows the time required to calculate energy and full gradient for one parameter 
vector at various system sizes. In order to see the time required to find a good 
approximand, this has to be multiplied with the number of function evaluations needed 
by the minimiser. 

Usually, one wants to find approximands for several different values of the 
Hamiltonian parameters. Then, one can save much time by running these minimisation 
in parallel if one has access to a computer cluster. 

Appendix A. 4- Availability 

We would welcome to see our code been used in further projects. Hence, researchers 
who are interested in applying our code in their own projects are encouraged to contact 
the authors. 
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Appendix A. 5. Density plots 

The plotting technique used to obtain Figs. 4 and 7 merits a brief explanation. For 
these plots, we calculated the plotted quantity at different value pairs for the quantities 
at the x and y axes. In order to work out interesting feature, we did not evaluate 
at a fixed grid but rather started with some losely spaced points to get an overview 
and then added more and more points at regions with interesting features. This 
allowed us to "explore" the parameter plane. However, it leaves us with a list of 
data points at irregular positions, which makes the usual 3D mesh plots unsuitable 
(as a mesh plot requires data from a regular grid). This is why we visualize the 
data instead with density plots, using colour to indicate z height. To obtain the 
colours we interpolated between the data point, and for this, we experimented with 
two interpolation algorithms, namely Akima's spline method [Aki96] and the Sibson's 
natural neighbours method [Sib81]. As the former has problems with strongly varying 
curvature (and this is the case here: the data varies more strongly near the phase 
transition than in the interiours of the phases) we used Sibson's method and produced 
Figs. 4 and 7 with the help of the Natgrid implementation [Cla04] of Sibson's 
algorithm. 

Appendix B. Calculating the gradient of the energy with respect to the 
parameter vector 

For use in the gradient-based minimisation we need a fast way to obtain the gradient 
of the energy function E{n). For the following, we assume that the Hamiltonian can 
be written in bond form, 



As before, B is the set of bonds, i. e. of pairs of interacting spins. In many cases 
H a fj is the same for all bonds ab, but having an inhomogencous Hamiltonian is no 
complication. 

As the energy function is given by E(x.) = J2( a b)eB ^ r H a bP( a ,b), hs gradient 
consists of a sum of derivatives of the reduced density matrices 



We shall now derive formulae for the components of the gradient, i.e., for the 
derivatives w. r.t. the different kinds of parameters. 

Appendix B.l. Derivatives w. r. t. the parameters for the local unitaries 

The derivative of a matrix exponential w. r. t. the components of the exponentiated 
matrix (or of linear combinations of these) is a very involved problem. Not only is the 
integral representation of this parametric derivative, though simple, in no way obvious, 
but also is the evaluation of this integral a very non-trivial matter. For a review of 
the history of this problem and current state of knowledge, consult Ref. [NH95]. 

For us, this is the main reason why we do not use the exponentiation of a 
Hermitian matrix for the parametrisation of the local unitaries, but rather the Cayley 
transform of it, for the latter involves only a matrix inverse, whose parametric 
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derivative is expressed by a simple formula: For any invertible square matrix A = A(t) 
that depends diffcrentiably on a real parameter t, we have 



At At 



(B.l) 



(for a proof, see e. g. [P1M]). 

We need n 2 real parameters to parametrise a Hcrmitian n x n matrix, which we 
arrange to form a n x n upper triangular matrix A with real entries in the diagonal, 
complex entries in the upper triangle and zeroes in the lower triangle. A = A + A' is 
now Hermitian and 

U = (it + A + AA (it- A- AA 1 (B.2) 



is unitary. Using Eq. (B.l), we get 
dU 



^(t + U) (\k) <Z| j + (il-A)- 1 , 



(B.3) 



We use this to calculate 
dE 



(zi+A)- 1 hfc) (i\r\\i) m (i + c/t) 



>{£} 



-4,, 



A7 



5^ trff fca 

h:(fc,a)6B 



e:(a,c)e6 



PbT T 



(£4 ® c/ Q ) p ba u b 



dU a 
dA aM 



dUg 

dA aM 

U C ) Pg C (Ug ® C/c) 1 + 



+ (C/ a ® f/ c ) ,3 ac 



where p 6c is the reduced density matrix without application of the local unitarics, i. e. 

he = {u b ®u c y Pbc (u b ®u c ). 



Appendix B. 2. Derivatives w. r. t. the deformation parameters 

For the derivatives w. r. t. the parameters Re d l c s and Im d l c s , we have to take care of 
the normalisation of p ao as it depends on those parameters. We abbreviate the middle 
line of Eq. (12) with p ao and start with using Eq. (19) in order to see that 

TCI 

P gb = }^ a i a k D ab Pgb 
j,k=l 

where (according to Eq. (20)) 

r.r'GS 
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We write the derivative as 

( l P t ={U a ®U b )W Vab \ -^- ^L-\w^ ab (U a ®U b )l (B.4) 

where the term in parentheses becomes 

(tr Pabf 

In order to evaluate dp a b/d^ ]^ jc^, we distinguish three cases, namely (i) c = a, 
(ii) c = b, (hi) c ^ {a, b}. 

Case (i): The only term in the middle line of Eq. (12) that depends on d l a is D J a ^ 
and this only for those terms in the sum, where j = I or k = I. One finds 

rf;{ - = { \ U E at <-Al d i\ \^) © pi + h.c. 

^(im/t ^ 1 ' k=l r lt r[,r' 2 eS 

Case (ii): Analogous: 
-fffr = ( i } ai £ a * k £ 4**54 41 I"*) (rir^| p[\ + H.c, 

9 \lm ) d bs L J fc=l r 2 ,ri,r^eS 

Case (Hi): For c ^ {a,b}, D 1 , is independent of d l c , but p^, is now dependent. 
We get: 



dpab 



1 1 m 



^=1 



0<£ (expi(<I>--<I>l s + ^r-^ S )), r ^ O ^+H.C 
where p l £ is given by Eq. (18). 



r.r'GS 2 

e£V\{a,b,c} 



Appendix B.3. Derivatives w. r. t. the superposition coefficients 

The derivatives w. r. t. Re on and Im ai are found the same way as for the deformation, 
and one gets 

^-rtr~p ab -p ab tv-M 



a - TTRcTT U1 Hab Fab ^ TTRcT 

dnrmpab _ d \i m J a a \im/ Q < 



da ab (trp ab ) 



2 



with 



where 



dp a 



i, 



1 1 m 

. E a fe| z? U))(^U|©^ + H.c 

' fc=l 



L»[ Q 6) \ is defined in Eq. (20). 
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Appendix B.4- Derivatives w. r. t. the phases 
We first introduce 

= i$) ($i with i$) = i r > ci * rir2 i 

res 2 

so that we can write — due to Eq. (21) — Eq. (12) as 

Pab = (U a ® U b ) (W* ab p ab ) (U a ® U b ). 

We have to take care that in case of a symmetrisation according to Sec. 3.4.1 a 
phase can occur more than once in the expression for p a b, and hence, we make use of 
the phase index mapping v(a, b) <E {1, . . . , R} (Sec. 3.4.1), that associates with every 
pair of spins a, 6 a phase matrix Q v [ a ,b)- We write (with r = (ri^)) 

'dW* 



dpab 



tr p a b 



{Ua U b ) 



" QPab + W^Q^ 



(U a 



and proceed to discuss the two derivatives in this expression. 

The first one is evidently non-zero only if v = i/(a, b) and then evaluates to 



d<f>* 



'q.q'GS 2 



" i/(a,b) 

The set notation in the Kronecker deltas accounts for the fact that <3? is symmetrised, 
q>rir 2 — Qr 2 ri ^ a g a j n Sec. 3.4.1) and hence, the order of the components of the 
vectors r, q, and q' must be disregarded. 

For the second term, we pull the derivative inwards 



dpab 



j,k=l 



a,a k D J nh 



dp£ 



then rewrite Eq. (15) as 
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The sum over e formally runs over N — 2 terms. Most of these vanish, however, namely 
all those for which neither v — v(a, e) nor v = v(b, e). For translation- invariant phases, 
the number of remaining terms is of the order of the coordination number of the lattice. 
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The derivative in the last line of the previous equation evaluates to 
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